JOPSS - Search Results

Search Results: Records 1-2 displayed on this page of 2

Oral presentation

Implementations about multiple GPU computation of lattice Boltzmann method with adaptive mesh refinement

Hasegawa, Yuta

no journal

To enable large-scale large-eddy simulation (LES) of the aerodynamics of complex-shaped bodies and of local wind in urban areas, multi-GPU computation of the lattice Boltzmann method (LBM) with adaptive mesh refinement has been implemented. In this presentation, we explain optimization techniques for the developed code, such as single-GPU optimization, optimization of MPI communication, and a spatially parallel implementation for intra-node multi-GPU computation on the latest GPU platforms.

Oral presentation

Enhancing intra-node Multi-GPU stencil calculations on DGX-2 using concurrent-addressing with Unified Memory

Hasegawa, Yuta; Onodera, Naoyuki; Idomura, Yasuhiro

no journal

In the "CityLBM" project at JAEA, a real-time urban wind prediction code based on adaptive mesh refinement (AMR) was developed. For the next generation of the CityLBM code, ensemble simulations are needed to improve the reliability of the prediction. For this purpose, the memory usage of each simulation should be reduced so that it fits within a single node, or 4-16 GPUs. To reduce memory usage and accelerate data communication in the AMR code, we tried an intra-node multi-GPU implementation using Unified Memory in CUDA. This approach makes parallel GPU implementation easy, because each access to Unified Memory is automatically served via HBM2 (the GPU's own memory) or NVLink (a neighboring GPU's memory). We implemented multi-GPU calculations for a 3D diffusion equation and a lattice Boltzmann equation on a uniform mesh, and tested weak/strong scalability and the performance of NVLink.
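
The idea of concurrent addressing can be illustrated with a minimal CPU-side sketch in plain Python. It is not the authors' CUDA code and does not use Unified Memory; it only assumes that several workers (stand-ins for GPUs) share one address space, so that each worker updates its own slab of a 3D diffusion grid and reads its neighbors' boundary planes directly, without an explicit halo exchange. All names (`idx`, `diffuse_slab`, `run`), the grid size, the diffusion coefficient `c`, and the slab decomposition along z are hypothetical choices made for illustration.

```python
def idx(x, y, z, nx, ny):
    """Flatten a 3D index into the shared 1D array."""
    return x + nx * (y + ny * z)


def diffuse_slab(src, dst, nx, ny, nz, z_begin, z_end, c):
    """Update interior cells of planes z_begin..z_end-1 (one worker's slab).

    Reads of planes z_begin-1 and z_end go straight into the shared array,
    which plays the role of a neighbor's memory in this analogy.
    """
    plane = nx * ny
    for z in range(max(z_begin, 1), min(z_end, nz - 1)):
        for y in range(1, ny - 1):
            for x in range(1, nx - 1):
                i = idx(x, y, z, nx, ny)
                # 7-point explicit diffusion stencil
                dst[i] = src[i] + c * (
                    src[i - 1] + src[i + 1]
                    + src[i - nx] + src[i + nx]
                    + src[i - plane] + src[i + plane]
                    - 6.0 * src[i]
                )


def run(nx, ny, nz, n_dev, steps, c=0.1):
    """Diffuse a point source for `steps` time steps using `n_dev` slabs."""
    src = [0.0] * (nx * ny * nz)
    src[idx(nx // 2, ny // 2, nz // 2, nx, ny)] = 1.0
    dst = list(src)  # boundary cells are never written and stay fixed
    # Split the z range into n_dev contiguous slabs.
    bounds = [nz * d // n_dev for d in range(n_dev + 1)]
    for _ in range(steps):
        # Workers run one after another here; they could run concurrently,
        # since src is read-only during a step and dst slabs are disjoint.
        for d in range(n_dev):
            diffuse_slab(src, dst, nx, ny, nz, bounds[d], bounds[d + 1], c)
        src, dst = dst, src
    return src


single = run(8, 8, 8, n_dev=1, steps=5)
multi = run(8, 8, 8, n_dev=4, steps=5)
print(single == multi)
```

Because every cell is computed with the same arithmetic regardless of which slab owns it, the decomposed run reproduces the single-slab result exactly. The sketch says nothing about performance; where the data physically resides and how fast remote reads are is what the scalability and NVLink tests in the abstract address.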